Private Access to Phrase Tables for Statistical Machine Translation
Abstract
Some Statistical Machine Translation systems never see the light of day because the owner of the appropriate training data cannot release it, and the potential user of the system cannot disclose what should be translated. We propose a simple and practical encryption-based method that addresses this barrier.
Similar Resources
Selective Phrase Pair Extraction for Improved Statistical Machine Translation
Phrase-based statistical machine translation systems depend heavily on the knowledge represented in their phrase translation tables. However, the phrase pairs included in these tables are typically selected using simple heuristics that potentially leave much room for improvement. In this paper, we present a technique for selecting the phrase pairs to include in phrase translation tables based o...
Sampling Phrase Tables for the Moses Statistical Machine Translation System
The idea of virtual phrase tables for statistical machine translation (SMT) that construct phrase table entries on demand by sampling a fully indexed bitext was first proposed ten years ago by Callison-Burch et al. (2005). However, until recently (Germann, 2014) no working and practical implementation of this approach was available in the Moses SMT system. We describe and evaluate this implemen...
Statistical machine translation using large j/e parallel corpus and long phrase tables
Our statistical machine translation system that uses large Japanese-English parallel sentences and long phrase tables is described. We collected 698,973 Japanese-English parallel sentences, and we used long phrase tables. Also, we utilized general tools for statistical machine translation, such as "Giza++" [1], "moses" [2], and "training-phrasemodel.perl" [3]. We used these data and these tools, W...
Complexity-Based Phrase-Table Filtering for Statistical Machine Translation
We describe an approach for filtering phrase tables in a Statistical Machine Translation system, which relies on a statistical independence measure called Noise, first introduced in (Moore, 2004). While previous work by (Johnson et al., 2007) also addressed the question of phrase table filtering, it relied on a simpler independence measure, the p-value, which is theoretically less satisfying th...
Exploiting Similarities among Languages for Machine Translation
Dictionaries and phrase tables are the basis of modern statistical machine translation systems. This paper develops a method that can automate the process of generating and extending dictionaries and phrase tables. Our method can translate missing word and phrase entries by learning language structures based on large monolingual data and mapping between languages from small bilingual data. It u...
Publication date: 2012